===========================================================
Energy cost bindings for Energy Aware Scheduling
===========================================================

===========================================================
1 - Introduction
===========================================================

This note specifies bindings required for energy-aware scheduling
(EAS) [1]. Historically, the scheduler's primary objective has been
performance.  EAS aims to provide an alternative objective - energy
efficiency. EAS relies on a simple platform energy cost model to
guide scheduling decisions.  The model only considers the CPU
subsystem.

This note is aligned with the definition of the layout of physical
CPUs in the system as described in the ARM topology binding
description [2]. The concept is applicable to any system so long as
the cost model data is provided for those processing elements in
that system's topology that EAS is required to service.

Processing elements refer to hardware threads, CPUs and clusters of
related CPUs in increasing order of hierarchy.

EAS requires two key cost metrics - busy costs and idle costs. Busy
costs comprise a list of compute capacities for the processing
element in question and the corresponding power consumption at each
capacity.  Idle costs comprise a list of power consumption values,
one for each idle state (C-state) that the processing element
supports.
For a detailed description of these metrics, their derivation and
their use see [3].

These cost metrics are required for processing elements in all
scheduling domain levels that EAS is required to service.

===========================================================
2 - energy-costs node
===========================================================

Energy costs for the processing elements in scheduling domains that
EAS is required to service are defined in the energy-costs node
which acts as a container for the actual per processing element cost
nodes. A single energy-costs node is required for a given system.

- energy-costs node

	Usage: Required

	Description: The energy-costs node is a container node and
	its sub-nodes describe costs for each processing element at
	all scheduling domain levels that EAS is required to
	service.

	Node name must be "energy-costs".

	The energy-costs node's parent node must be the cpus node.

	The energy-costs node's child nodes can be:

	- one or more cost nodes.

	Any other configuration is considered invalid.

The energy-costs node can only contain a single type of child node
whose bindings are described in paragraph 4.
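
As a minimal sketch of this layout, the fragment below places a
single energy-costs container directly under the cpus node with one
cost node inside it. The label CPU_COST_0 is only illustrative and
the property values are placeholders rather than measured data; the
properties themselves are specified in paragraph 5.

cpus {
	energy-costs {
		CPU_COST_0: core-cost0 {
			busy-cost-data = <512 100>;	/* placeholder */
			idle-cost-data = <10>;		/* placeholder */
		};
	};
};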

===========================================================
3 - energy-costs node child nodes naming convention
===========================================================

energy-costs child nodes must follow a naming convention where the
node name must be "thread-costN", "core-costN", "cluster-costN" or
"system-costN" depending on whether the costs in the node are for a
thread, core, cluster or the system.  N (where N = {0, 1, ...}) is
the node number and has no bearing on the OS's logical thread, core
or cluster index.
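
For illustration only, the sketch below names one cost node of each
processing element kind. The property bodies are elided, so the
fragment is not a complete binding on its own; the node numbers are
arbitrary and are not tied to the OS's logical numbering.

cpus {
	energy-costs {
		thread-cost0 {
			/* cost properties, see paragraph 5 */
		};
		core-cost0 {
			/* cost properties, see paragraph 5 */
		};
		cluster-cost0 {
			/* cost properties, see paragraph 5 */
		};
	};
};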

===========================================================
4 - cost node bindings
===========================================================

Bindings for cost nodes are defined as follows:

- system-cost node

	Description: Optional. Must be declared within an energy-costs
	node. A system should contain no more than one system-cost node.

	Systems with no modelled system cost should not provide this
	node.

	The system-cost node name must be "system-costN" as
	described in 3 above.

	A system-cost node must be a leaf node with no children.

	Properties for system-cost nodes are described in paragraph
	5 below.

	Any other configuration is considered invalid.

- cluster-cost node

	Description: must be declared within an energy-costs node. A
	system can contain multiple clusters and each cluster
	serviced by EAS must have a corresponding cluster-cost
	node.

	The cluster-cost node name must be "cluster-costN" as
	described in 3 above.

	A cluster-cost node must be a leaf node with no children.

	Properties for cluster-cost nodes are described in paragraph
	5 below.

	Any other configuration is considered invalid.

- core-cost node

	Description: must be declared within an energy-costs node. A
	system can contain multiple cores and each core serviced by
	EAS must have a corresponding core-cost node.

	The core-cost node name must be "core-costN" as described in
	3 above.

	A core-cost node must be a leaf node with no children.

	Properties for core-cost nodes are described in paragraph
	5 below.

	Any other configuration is considered invalid.

- thread-cost node

	Description: must be declared within an energy-costs node. A
	system can contain cores with multiple hardware threads and
	each thread serviced by EAS must have a corresponding
	thread-cost node.

	The thread-cost node name must be "thread-costN" as described
	in 3 above.

	A thread-cost node must be a leaf node with no children.

	Properties for thread-cost nodes are described in paragraph
	5 below.

	Any other configuration is considered invalid.

===========================================================
5 - Cost node properties
===========================================================

All cost node types must have only the following properties:

- busy-cost-data

	Usage: required
	Value type: An array of 2-item tuples. Each item is of type
	u32.
	Definition: The first item in the tuple is the capacity
	value as described in [3]. The second item in the tuple is
	the energy cost value as described in [3].

- idle-cost-data

	Usage: required
	Value type: An array of 1-item tuples. The item is of type
	u32.
	Definition: The item in the tuple is the energy cost value
	as described in [3].
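
A sketch of how the two properties might be written, reusing the
core-cost0 values from the example dts in this note. Device tree
cell arrays are flat, so the tuples are expressed purely by position
and the line breaks are for readability only. The node is assumed to
sit inside the energy-costs node.

core-cost0 {
	/* <capacity energy-cost> pairs, one pair per line */
	busy-cost-data = <
		417   168
		579   251
		744   359
		883   479
		1024  616
	>;
	/* one energy cost value per idle state */
	idle-cost-data = <
		15
		0
	>;
};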

===========================================================
6 - Extensions to the cpu node
===========================================================

The cpu node is extended with a property that establishes the
connection between the processing element represented by the cpu
node and the cost nodes associated with this processing element.

The connection is expressed in line with the topological hierarchy
that this processing element belongs to starting with the level in
the hierarchy that this processing element itself belongs to through
to the highest level that EAS is required to service.  The
connection cannot be sparse and must be contiguous from the
processing element's level through to the highest desired level. The
highest desired level must be the same for all processing elements.

Example: Given that a cpu node may represent a thread that is a part
of a core, this property may contain multiple elements which
associate the thread with cost nodes describing the costs for the
thread itself, the core the thread belongs to, the cluster the core
belongs to and so on. The elements must be ordered from the lowest
level nodes to the highest desired level that EAS must service. The
highest desired level must be the same for all cpu nodes. The
elements must not be sparse: there must be elements for the current
thread, the next level of hierarchy (core) and so on without any
'holes'.

Example: Given that a cpu node may represent a core that is a part
of a cluster of related cpus, this property may contain multiple
elements which associate the core with cost nodes describing the
costs for the core itself, the cluster the core belongs to and so
on. The elements must be ordered from the lowest level nodes to the
highest desired level that EAS must service. The highest desired
level must be the same for all cpu nodes. The elements must not be
sparse: there must be elements for the current core, the next
level of hierarchy (cluster) and so on without any 'holes'.

If the system comprises hierarchical clusters of clusters, this
property will contain multiple associations with the relevant number
of cluster elements in hierarchical order.

Properties added to the cpu node:

- sched-energy-costs

	Usage: required
	Value type: List of phandles
	Definition: a list of phandles to specific cost nodes in the
	energy-costs parent node that correspond to the processing
	element represented by this cpu node in hierarchical order
	of topology.

	The order of phandles in the list is significant. The first
	phandle is to the current processing element's own cost
	node.  Subsequent phandles are to higher hierarchical level
	cost nodes up until the maximum level that EAS is to
	service.

	All cpu nodes must have the same highest level cost node.

	The phandle list must not be sparsely populated with handles
	to non-contiguous hierarchical levels. See commentary above
	for clarity.

	Any other configuration is invalid.

- freq-energy-model

	Usage: optional
	Description: Must be declared if the energy model represents
	frequency/power values. If absent, the energy model is
	treated as capacity/power by default.
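
To illustrate the ordering and contiguity rules for
sched-energy-costs, here is a hedged sketch for a cpu node that
represents a hardware thread. The labels THREAD_COST_0, CORE_COST_0
and CLUSTER_COST_0 are hypothetical and are assumed to be defined on
thread-cost, core-cost and cluster-cost nodes in the energy-costs
node; the other cpu node properties are omitted.

cpu@0 {
	device_type = "cpu";
	reg = <0x0 0x0>;
	/* own level first, then core, then cluster: no level skipped */
	sched-energy-costs = <&THREAD_COST_0 &CORE_COST_0 &CLUSTER_COST_0>;
};

By contrast, a list such as <&THREAD_COST_0 &CLUSTER_COST_0> would
be invalid because it skips the core level.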

===========================================================
7 - Example dts
===========================================================

Example 1 (ARM 64-bit, 6-cpu system, two clusters of cpus, one
cluster of 2 Cortex-A57 cpus, one cluster of 4 Cortex-A53 cpus):

cpus {
	#address-cells = <2>;
	#size-cells = <0>;
	.
	.
	.
	A57_0: cpu@0 {
		compatible = "arm,cortex-a57","arm,armv8";
		reg = <0x0 0x0>;
		device_type = "cpu";
		enable-method = "psci";
		next-level-cache = <&A57_L2>;
		clocks = <&scpi_dvfs 0>;
		cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
		sched-energy-costs = <&CPU_COST_0 &CLUSTER_COST_0>;
	};

	A57_1: cpu@1 {
		compatible = "arm,cortex-a57","arm,armv8";
		reg = <0x0 0x1>;
		device_type = "cpu";
		enable-method = "psci";
		next-level-cache = <&A57_L2>;
		clocks = <&scpi_dvfs 0>;
		cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
		sched-energy-costs = <&CPU_COST_0 &CLUSTER_COST_0>;
	};

	A53_0: cpu@100 {
		compatible = "arm,cortex-a53","arm,armv8";
		reg = <0x0 0x100>;
		device_type = "cpu";
		enable-method = "psci";
		next-level-cache = <&A53_L2>;
		clocks = <&scpi_dvfs 1>;
		cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
		sched-energy-costs = <&CPU_COST_1 &CLUSTER_COST_1>;
	};

	A53_1: cpu@101 {
		compatible = "arm,cortex-a53","arm,armv8";
		reg = <0x0 0x101>;
		device_type = "cpu";
		enable-method = "psci";
		next-level-cache = <&A53_L2>;
		clocks = <&scpi_dvfs 1>;
		cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
		sched-energy-costs = <&CPU_COST_1 &CLUSTER_COST_1>;
	};

	A53_2: cpu@102 {
		compatible = "arm,cortex-a53","arm,armv8";
		reg = <0x0 0x102>;
		device_type = "cpu";
		enable-method = "psci";
		next-level-cache = <&A53_L2>;
		clocks = <&scpi_dvfs 1>;
		cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
		sched-energy-costs = <&CPU_COST_1 &CLUSTER_COST_1>;
	};

	A53_3: cpu@103 {
		compatible = "arm,cortex-a53","arm,armv8";
		reg = <0x0 0x103>;
		device_type = "cpu";
		enable-method = "psci";
		next-level-cache = <&A53_L2>;
		clocks = <&scpi_dvfs 1>;
		cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
		sched-energy-costs = <&CPU_COST_1 &CLUSTER_COST_1>;
	};

	energy-costs {
		CPU_COST_0: core-cost0 {
			busy-cost-data = <
				417   168
				579   251
				744   359
				883   479
				1024  616
			>;
			idle-cost-data = <
				15
				0
			>;
		};
		CPU_COST_1: core-cost1 {
			busy-cost-data = <
				235 33
				302 46
				368 61
				406 76
				447 93
			>;
			idle-cost-data = <
				6
				0
			>;
		};
		CLUSTER_COST_0: cluster-cost0 {
			busy-cost-data = <
				417   24
				579   32
				744   43
				883   49
				1024  64
			>;
			idle-cost-data = <
				65
				24
			>;
		};
		CLUSTER_COST_1: cluster-cost1 {
			busy-cost-data = <
				235 26
				303 30
				368 39
				406 47
				447 57
			>;
			idle-cost-data = <
				56
				17
			>;
		};
	};
};

===============================================================================
[1] https://lkml.org/lkml/2015/5/12/728
[2] Documentation/devicetree/bindings/arm/topology.txt
[3] Documentation/scheduler/sched-energy.txt
